feat(storage): support replicated LocalHummockStorage
#10226
Codecov Report
@@ Coverage Diff @@
## main #10226 +/- ##
==========================================
+ Coverage 70.52% 70.55% +0.02%
==========================================
Files 1244 1244
Lines 213316 213517 +201
==========================================
+ Hits 150435 150638 +203
+ Misses 62881 62879 -2
... and 2 files with indirect coverage changes
Currently, if we read from a state table (backed by a local state store), we only read the `ReadVersion` associated with it. If we read from a storage table (backed by the global state store), we read all `ReadVersion`s.
Before merging this PR, we should decide whether a read from the storage table may read data from the replicated `ReadVersion`s. If it is allowed, we must later carefully ensure that only the data of vnodes from other CNs is replicated to the current CN. If it is not allowed, this PR should distinguish whether a `ReadVersion` is replicated, and return only the non-replicated `ReadVersion`s when handling a global state store read.
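To make the two read paths concrete, here is a minimal sketch of the "not allowed" option. It is a toy model, not the Hummock implementation: the `ReadVersion` struct, its fields, and the functions `local_read`, `global_read`, and `sample_versions` are all hypothetical names introduced only for illustration.

```rust
use std::collections::BTreeMap;

/// Hypothetical, simplified stand-in for a read version (not the real Hummock type).
#[derive(Clone, Debug)]
struct ReadVersion {
    instance_id: u64,
    is_replicated: bool,
    data: BTreeMap<String, String>,
}

/// Local state store read (state table): only the caller's own read version is consulted.
fn local_read<'a>(
    versions: &'a [ReadVersion],
    instance_id: u64,
    key: &str,
) -> Option<&'a String> {
    versions
        .iter()
        .find(|v| v.instance_id == instance_id)
        .and_then(|v| v.data.get(key))
}

/// Global state store read (storage table): all read versions are consulted,
/// except those marked as replicated.
fn global_read<'a>(versions: &'a [ReadVersion], key: &str) -> Vec<&'a String> {
    versions
        .iter()
        .filter(|v| !v.is_replicated)
        .filter_map(|v| v.data.get(key))
        .collect()
}

/// One original read version (instance 1) and one replica of it (instance 2).
fn sample_versions() -> Vec<ReadVersion> {
    let row = BTreeMap::from([("k".to_string(), "v".to_string())]);
    vec![
        ReadVersion { instance_id: 1, is_replicated: false, data: row.clone() },
        ReadVersion { instance_id: 2, is_replicated: true, data: row },
    ]
}
```

In this model, the owner of the replica (instance 2) still sees its data through `local_read`, while `global_read` returns the key only once because the replica is skipped.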
Can we add test cases to ensure that 1) a replicated `ReadVersion` can be seen by the corresponding `LocalStateStore` (for streaming reads) and 2) it cannot be seen by `HummockStorage` (for batch reads)?
.filter(|v| !v.read_arc().is_replicated())
.cloned()
.collect_vec()
})
This filters out replicated read versions when reading from the global state store.
Added them in 95f9337.
LGTM
@chenzl25 PTAL
LGTM!
I hereby agree to the terms of the RisingWave Labs, Inc. Contributor License Agreement.
What's changed and what's your intention?
Part of #10225.
The temporary storage is needed to replicate upstream shared buffer changes when the two are not on the same CN.
We add an `is_replicated` flag to indicate whether a local state store is replicated. If it is, we only read its read version when reading via the state table.
This is because we have to consider an edge case: the executor using replicated storage may be scheduled to the same CN as the upstream mview executor. If replication happens, we will have duplicate read versions in the shared buffer.
Because a local state store read (which is what state table reads use) will ONLY read its own read version, this is fine for the state table.
But the storage table reads from all read versions (via global state store read), which is an issue since we would have 2 copies (original + replicated).
Hence in this PR we also amend the global state store read to deduplicate by ignoring ALL read version replicas.
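The same-CN edge case can be illustrated with a small toy model. This is only a sketch under stated assumptions: `ReadVersion`, `replicate`, `global_scan`, and `same_cn_shared_buffer` are hypothetical names for illustration and do not mirror the actual storage code.

```rust
/// Hypothetical, simplified read version holding a few key-value rows.
struct ReadVersion {
    is_replicated: bool,
    rows: Vec<(String, String)>,
}

/// Hypothetical helper: copy an upstream read version into a replica
/// owned by the downstream executor.
fn replicate(upstream: &ReadVersion) -> ReadVersion {
    ReadVersion { is_replicated: true, rows: upstream.rows.clone() }
}

/// Global scan over every read version in the shared buffer.
/// With `skip_replicas = false` this models the behavior before the fix;
/// with `skip_replicas = true` replicas are ignored.
fn global_scan(versions: &[ReadVersion], skip_replicas: bool) -> Vec<(String, String)> {
    versions
        .iter()
        .filter(|v| !(skip_replicas && v.is_replicated))
        .flat_map(|v| v.rows.iter().cloned())
        .collect()
}

/// Shared buffer of a CN that hosts both the upstream executor and its replica.
fn same_cn_shared_buffer() -> Vec<ReadVersion> {
    let upstream = ReadVersion {
        is_replicated: false,
        rows: vec![("k1".into(), "v1".into())],
    };
    let replica = replicate(&upstream);
    vec![upstream, replica]
}
```

Scanning without skipping replicas yields the row twice (original + replicated); skipping replicas yields it once, which is the deduplication described above.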
- […] `ReadVersion`, it should follow the local options.
- […] `ReadVersion` which are `replicate`s.
Checklist For Contributors
- `./risedev check` (or alias, `./risedev c`)
Checklist For Reviewers